Japanese Laid-Open Patent Application Sho 61 - 107434
SPECIFICATION 

Title of the Invention: 

DATA PROCESSOR

Scope of Patent Claims:

Claim 1 

A data processor that has a main memory in which programs and data are stored, or a main
memory and a cache memory that maintains a copy of its contents; that simultaneously
processes multiple instructions constituting the programs; and that starts processing an
instruction which uses, or depends upon, the processing results of a conceptually preceding
instruction without waiting for those processing results, wherein the data processor is
provided with:

a prediction state indicating means which, in the case that a storage instruction, whose
operational results are stored in the main memory, is processed on the basis of a prediction,
indicates that the storage instruction is in the prediction state until a determination can
be made as to whether or not the prediction is correct;

a storage data maintaining means that maintains the result of the operation of the storage
instruction and its storage address until whether or not the prediction is correct is
determined;

a control means that, in the case that the prediction is correct, changes the indication by
the prediction state indicating means into a non-prediction state and concurrently writes
the result of the operation into the main memory or the cache memory using the contents
maintained by the storage data maintaining means; and

another control means that, in the case that the prediction is incorrect, invalidates the
data of the storage instruction maintained by the storage data maintaining means.

Claim 2 

The data processor according to Claim 1, wherein the prediction for the storage instruction
is a prediction of the results of the operation of the preceding instruction.

Claim 3

The data processor according to Claim 1, wherein the prediction for the storage instruction
is a prediction of the branch decision result of a branch instruction.

Claim 4

The data processor according to Claim 1, wherein the prediction for the storage instruction
is a prediction of the branch target instruction array data of a branch instruction.

Detailed Description of the Invention: 

[Application Field of the Invention] 

The present invention relates to a digital computer, and particularly to a storage
processing method in a data processor that, for the purpose of increasing the processing
speed, predicts the results of a conceptually preceding instruction before its execution is
complete and simultaneously executes successive instructions in parallel.

[Background of the Invention] 

Conventionally, in general-purpose large-sized digital computers, it has become common to
process multiple instructions simultaneously, using a pipeline method or a parallel
processing method, for the purpose of increasing the processing speed. Examples are the IBM
3033, which is described in "Internal Design and Performance of IBM 3033 Processor",
"General-purpose Computer", Nikkei Electronics, p. 251 - 263, and the IBM 360/91, which is
described in "An Efficient Algorithm for Exploiting Multiple Arithmetic Units", IBM Journal,
Jan. 1967. To increase the speed of branch instruction processing, the [IBM] 3033 adopts a
method in which whether or not the branch is realized is predicted so that processing of
successive instructions can start before the branch decision is made, and the successive
instructions are processed in the prediction state until the branch decision is made.

The [IBM] 360/91, in the meantime, adopts a method in which multiple computing elements
that can perform instruction operations independently are established for the purpose of
increasing the processing speed, and the operation of an instruction is started as soon as
its input operands are gathered, even if it is a conceptually successive instruction.

In order to accomplish an even higher speed, it is desirable to adopt both the prediction
processing method and the parallel operation method. In this case, however, the following
problem occurs. If a storage instruction is processed in the prediction state and, after its
results have been written into the main memory, the prediction turns out to be wrong, this
is undesirable for two reasons. The 1st reason is that the storage instruction should not
have been executed, so it becomes necessary to restore the contents of the main memory to
the state that existed before the writing was performed. Control logic must be established
for this purpose, and the extra time lost in the restoration may prevent the processing
speed from increasing. The 2nd reason is that, in the case that multiple CPUs (central
processing units) and channels share the main memory, results that have been improperly
written by the above-mentioned storage instruction may be read out by another CPU or channel
before the CPU completes the restoration of the main memory. In many cases this is not
allowed.

[Objective of the Invention] 

The objective of the present invention is to provide a high-speed instruction processing
method for a data processor in which both the prediction processing method and the parallel
operation method have been adopted, without the problems of the prior art. In other words,
no restoration operation is required because of improper writing into the main memory by a
storage instruction in the prediction state, and no wrong content in the main memory is read
out by other CPUs or channels.

[Summary of the Invention] 

In the present invention, one or more storage buffers are established to maintain data and 
address information to be stored, and a state flag that indicates whether or not the 
processing of the storage instruction is in the prediction state is provided to each storage 
buffer. In processing the storage instruction, the processing is continued through the
operation stage based upon the parallel operation method, regardless of whether or not it is
in the prediction state. In the stage of writing into the main memory, however, writing is
not performed until the prediction state is resolved, and the data are maintained in the
storage buffer in the meantime. At the point at which the prediction state is resolved, if
the prediction is incorrect, the storage instruction within the storage buffer is cancelled,
and if the prediction is correct, writing into the main memory is performed.
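
The behaviour summarized above can be illustrated with a minimal software sketch. It is
not the circuit of the embodiment: the names StoreEntry, StoreBuffer, store, resolve and
drain are hypothetical, main memory is modelled as a Python dictionary, and a single
prediction is assumed to cover all buffered stores.

```python
from dataclasses import dataclass, field


@dataclass
class StoreEntry:
    address: int
    data: int
    predicted: bool  # state flag: True while the store is in the prediction state


@dataclass
class StoreBuffer:
    memory: dict = field(default_factory=dict)   # stands in for the main memory
    entries: list = field(default_factory=list)  # stores waiting in the buffer

    def store(self, address, data, predicted):
        """Hold a store; it is never written to memory while predicted."""
        self.entries.append(StoreEntry(address, data, predicted))
        self.drain()

    def resolve(self, prediction_correct):
        """Resolve the prediction state of every buffered store."""
        if not prediction_correct:
            # Cancel the stores that should never have been executed.
            self.entries = [e for e in self.entries if not e.predicted]
        for e in self.entries:
            e.predicted = False
        self.drain()

    def drain(self):
        """Write every store that is not in the prediction state to memory."""
        remaining = []
        for e in self.entries:
            if e.predicted:
                remaining.append(e)
            else:
                self.memory[e.address] = e.data
        self.entries = remaining
```

In this sketch a store made with predicted=True leaves memory untouched until
resolve(True) is called, and resolve(False) discards it, so no restoration of memory is
ever needed.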

[Embodiments of the Invention]

An embodiment of the present invention is explained next. For convenience of explanation,
the example machine is based on the IBM System/370 architecture.

Fig. 1 shows typical instruction formats. Fig. 1 (a) is the format of a floating point add
instruction AD or a storage instruction ST. The OP part indicates the contents of the
operation; R1 indicates the number of the register where the 1st operand of the instruction
is stored; and X2, B2 and D2 indicate an index register number, a base register number and a
displacement, respectively, for the purpose of forming the 2nd operand address. Fig. 1 (b)
is the format of a Branch-on-Condition instruction (hereafter abbreviated as BC), where OP
indicates that it is a BC instruction. The M1 part is a mask that indicates the values of
the condition code for which the branch is realized; and X2, B2 and D2 indicate an index
register number, a base register number and a displacement, respectively, for the purpose of
forming the branch target address.
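
As an illustration of how X2, B2 and D2 form an address, the following sketch adds the
contents of the index register, the contents of the base register and the displacement. It
assumes the System/370 conventions that a register number of 0 contributes zero, that D2 is
a 12-bit displacement and that addresses are 24 bits wide; the function name and the gpr
list of general register contents are hypothetical.

```python
ADDRESS_MASK = 0xFFFFFF  # 24-bit addressing, assumed here for System/370


def effective_address(gpr, x2, b2, d2):
    """Form a 2nd operand (or branch target) address from X2, B2 and D2."""
    if not 0 <= d2 <= 0xFFF:
        raise ValueError("D2 is a 12-bit displacement")
    index = gpr[x2] if x2 != 0 else 0  # register number 0 means no index
    base = gpr[b2] if b2 != 0 else 0   # register number 0 means no base
    return (index + base + d2) & ADDRESS_MASK
```

For example, with 0x1000 in register 1 and 0x20 in register 2, the call
effective_address(gpr, 2, 1, 0x8) yields 0x1028.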

Fig. 2 shows an arrangement in the main memory of an instruction array for which high-speed
processing is realizable in a data processor using the present invention. The 1st
instruction is an AD instruction, which establishes a condition code based on the results of
the operation. The 2nd instruction is a BC instruction, which decides whether or not the
branch is realized according to the value of the condition code. The 3rd instruction is an
ST instruction, which is executed in the case that the branch of the BC instruction is not
realized. The ST instruction has no data dependency on the BC and AD instructions, apart
from the fact that whether or not the ST instruction is executed is determined by the branch
decision of the BC instruction; the value of its operand register is ascertained
sufficiently in advance.

Fig. 3 is an outline of the entire construction of a data processor in which the present
invention has been adopted. Symbols '1' and '2' are CPUs; '3' is a main memory device that
comprises a main memory where programs and data are stored and its control (hereafter
abbreviated as 'MS'); and '4' is a channel, which controls data transmission between an
input/output device 5 and the HS 3 [sic]. The CPUs 1 and 2 and the channel 4 are each
connected to the HS 3 [sic], and each of them writes to and reads from the main memory.

The interior of the CPU 1 is comprised of an instruction control unit (hereafter abbreviated 
as 'IU') 6, an operational unit (abbreviated as 'EU') 7 and a memory control unit (hereafter, 
abbreviated as 'SCU') 8. 

The IU 6 issues an instruction read-out request to the SCU 8, decodes the instruction read
out via the SCU 8, and then issues an operand read-out request to the SCU 8. The read-out
operand is transmitted to the EU 7 via the SCU 8 and the IU 6. When the IU 6 passes the
decoded information of the instruction along with the operand to the EU 7, the instruction
is set up in an available computing element among the multiple established elements E0, E1,
..., and the operation is performed. In the case that the instruction is a storage
instruction, the computing element issues a storage request to the SCU 8, and the storage
data and the address information are transmitted. The SCU 8 receives the instruction
read-out requests and operand read-out requests from the IU 6 and the storage requests,
performs address conversion if necessary, and then performs these main memory references to
the HS 3 [sic]. In addition to the normal control circuits, a storage buffer 9 and a storage
buffer control circuit 10 according to the present invention are established in the SCU 8.

Fig. 4 is an outline time chart of the processing by the above-mentioned CPU 1 of the
instruction array in Fig. 2, placed in the MS 3. The horizontal axis indicates the time in
units of machine cycles, and the vertical axis indicates each logical unit/logic circuit.
In the present chart, the period during which each instruction is processed in each logical
unit/logic circuit is expressed by surrounding the mnemonic of the instruction with a
rectangular box. Further, the cycles from the time when the IU 6 starts to decode the AD
instruction to the time when the ST instruction completes writing into the MS 3 are referred
to as C1, C2, ..., C[illegible], in respective order.

Decoding of the AD instruction and reading-out of its operand are performed during C1, and
the operation is performed at the computing element E0 during C2 through C5. Although not
shown in the present time chart, the results are written into the floating point register
and a condition code is written into the PSW during C[illegible]. Further, the BC
instruction is decoded during C2, and is set up in the computing element E1 during C3.
However, processing of the preceding AD instruction has not yet been completed, so the
branch decision, which must be made using the condition code established by the AD
instruction, cannot be made, and it is deferred through C5. The condition code of the AD
instruction is established during C6, so the branch decision of the BC instruction is made
during this cycle. In the present embodiment, it is presumed that branching is not realized.
In the meantime, the ST instruction is decoded during C3. Strictly speaking, the branch
decision of the BC instruction has not been made during C3, so whether or not the ST
instruction is to be executed cannot be determined. However, processing of the ST
instruction is started based upon the prediction that the branch of the BC instruction is
not realized. The ST instruction is set up in the computing element E2 during C4, and the
storage data and the address information are temporarily stored in the storage buffer during
C5. On this occasion, a prediction flag indicating that the ST instruction is in the
prediction state is set. In the meantime, the branch decision of the BC instruction is made
during C6. As described above, branching is not realized in the present embodiment, so the
prediction is correct. Therefore, the prediction flag of the ST instruction stored within
the storage buffer is reset. In addition, the storage data is written into the main memory
during C7 based on the address information. If the branch decision of the BC instruction had
shown the prediction to be incorrect, the ST instruction within the storage buffer would be
cancelled. This operation enables the computing element E1 to become vacant during C5,
making it usable for the operation of successive instructions. Therefore, when other
computing elements are in use, it is helpful for increasing the speed of instruction
processing.

The construction of the storage buffer and of the storage buffer control circuit is
explained hereafter, with reference to Fig. 5 and Fig. 6.

First, Fig. 5 shows the construction of the storage buffer. Symbols '501' through '502'
are (m+1) sets of storage data registers and storage address registers. When the ST
instruction is executed in the operational unit, a storage request signal REQ is issued from
the operational unit via a signal line 503. Simultaneously, the storage data and the storage
address transmitted from the operational unit via signal lines 504 and 505 are stored in one
of the registers 501 through 502. The number of the register in which the storage data and
the storage address are stored is indicated by an input pointer IP transmitted from the
storage buffer control circuit 10 via a signal line 506. Symbol '507' is a selector, which
selects the contents of the storage data register and the storage address register of the
number indicated by an output pointer OP, transmitted from the storage buffer output control
circuit via a signal line 508, and transmits them to the main memory device via a signal
line 509.

Fig. 6 shows the construction of the storage buffer control circuit 10. Symbol '601' is a
storage buffer state control circuit, equipped with prediction flags 602 through 603 that
indicate whether each storage request is in the prediction state, corresponding to the (m+1)
sets of registers 501 through 502. In other words, '602' comprises a request flag REQ0,
which indicates that there is a storage request corresponding to the No. 0 register within
the storage buffer, and a prediction flag P0, which indicates that this storage request
belongs to an ST instruction being processed in the prediction state. Similarly, symbol
'603' comprises a request flag REQm and a prediction flag Pm corresponding to the No. m
register.

Symbol '604' is a storage buffer input control circuit, which receives REQ0 through REQm
from the [storage buffer state control circuit] 601 via a signal line 605. It selects one of
the requests that are in the OFF state, and transmits its number to the signal line 506 as
the input pointer IP of the storage buffer. The signal line 506 is routed to the storage
buffer 9 and the storage buffer state control circuit 601. When an REQ signal is transmitted
to the storage buffer state control circuit [601] from the EU 7 (Fig. 3) via the signal line
503, the request flag of the number indicated by the input pointer IP is set to '1', and
simultaneously the value of the prediction state signal P, transmitted from the EU 7 via a
signal line 606, is set as the value of the prediction flag indicated by the input pointer
IP.
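
The input side described above can be sketched in software as follows. The flags REQ0
through REQm and P0 through Pm are modelled as two Python lists; the function names are
hypothetical, and choosing the lowest-numbered free register is an assumption, since the
text only says that one request in the OFF state is selected.

```python
def select_input_pointer(req):
    """Return the number of a register whose request flag is OFF, or None."""
    for number, flag in enumerate(req):
        if not flag:
            return number
    return None  # buffer full; this case is not described in the text


def accept_request(req, pred, p_signal):
    """On an REQ signal, set REQ[IP] and copy the P signal into P[IP]."""
    ip = select_input_pointer(req)
    if ip is None:
        return None
    req[ip] = True
    pred[ip] = p_signal
    return ip
```

The returned value plays the role of the input pointer IP, which in the circuit also
selects the storage data register and storage address register to be loaded.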

Symbol '607' is a storage buffer output control circuit, which receives REQ0 through REQm
from the [storage buffer state control circuit] 601 via the signal line 605 and P0 through
Pm via a signal line 608, and transmits the number(s) of the register(s) for which the
prediction flag is in the OFF state and the request flag is in the ON state to the signal
line 508 as the output pointer OP of the storage buffer. The signal line 508 is routed to
the storage buffer 9 and the storage buffer state control circuit 601. Further, when a
register for which the prediction flag is in the OFF state and the request flag is in the ON
state actually exists, the [storage buffer output control circuit] 607 issues a storage
request signal MREQ to the MS 3 (Fig. 3). The [storage request signal] MREQ is transmitted
to the [MS] 3 and the [storage buffer state control circuit] 601 via a signal line 609. When
the [storage buffer output control circuit] 607 issues the [storage request signal] MREQ,
the [storage buffer state control circuit] 601 resets the request flag(s) and the prediction
flag(s) of the number(s) indicated by the OP to '0'.
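
The output selection described above can be sketched in the same style. The function name
is hypothetical, and issuing one request per call, lowest register number first, is an
assumption; the text leaves the selection order open.

```python
def issue_mreq(req, pred):
    """Select OP: a register with REQ on and P off. Return OP, or None."""
    for number, (r, p) in enumerate(zip(req, pred)):
        if r and not p:
            # Issuing MREQ resets both flags of the selected register to '0'.
            req[number] = False
            pred[number] = False
            return number
    return None  # no request is eligible, so no MREQ is issued
```

A request whose prediction flag is still ON is skipped, which is what keeps a predicted
store out of the main memory.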



In the meantime, when the storage buffer state control circuit 601 receives a branch
decision signal BJ transmitted from the EU 7 via a signal line 610, if the branch prediction
is correct (in other words, in the present embodiment, if the BJ indicates that branching is
not realized), P0 through Pm are all reset to '0'. With this operation, storage requests for
which the MREQ could not be issued, because the prediction flag was in the ON state even
though the request flag was in the ON state, can now be issued since the prediction flag is
in the OFF state. Then, writing into the MS [3] is performed. Further, when the BJ signal is
received, if the branch prediction is incorrect, those requests among REQ0 through REQm
whose corresponding prediction flags are in the ON state are all reset to '0'.
Simultaneously, P0 through Pm are also all reset to '0'. With this operation, the storage
requests within the storage buffer that should not actually have been executed are
cancelled.
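
The flag updates made on receipt of the BJ signal can be sketched as follows, again with
the request and prediction flags modelled as Python lists. The function name and the
boolean argument are hypothetical.

```python
def on_branch_decision(req, pred, prediction_correct):
    """Update REQ0..REQm and P0..Pm when the branch decision arrives."""
    if not prediction_correct:
        # Cancel every request that was made in the prediction state.
        for number, p in enumerate(pred):
            if p:
                req[number] = False
    # In both cases all prediction flags are reset to '0'.
    for number in range(len(pred)):
        pred[number] = False
```

After a correct prediction the surviving requests have REQ on and P off, so they become
eligible for an MREQ; after an incorrect one they simply disappear from the buffer.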

The signals REQ, the storage data, the storage addresses and the [branch decision signal]
BJ, transmitted from the operational unit to the memory control unit SCU 8 via the signal
lines 503, 504, 505 and 610, can easily be formed using prior art technology, so their
explanation is omitted. Further, the signal P on the signal line 606 is formed, for example,
as follows. In the case that instructions are set up from the instruction control unit 6 to
the EU 7 in the same order as the conceptual execution sequence of the instructions, the
value of a flip-flop that is set to '1' at the moment a branch instruction is set up in the
EU 7, and is set to '0' at the moment the branch decision is made, can be used as the signal
P.
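
The flip-flop that forms the signal P can be modelled in a few lines. The class and method
names are hypothetical, and the sketch, like a single flip-flop, tracks only one
outstanding branch at a time.

```python
class PredictionFlipFlop:
    """Models the flip-flop whose value is used as the signal P."""

    def __init__(self):
        self.p = 0

    def branch_set_up(self):
        # A branch instruction has been set up in the EU: later stores are predicted.
        self.p = 1

    def branch_decided(self):
        # The branch decision has been made: later stores are no longer predicted.
        self.p = 0
```

A storage request issued between branch_set_up() and branch_decided() would carry P = 1
and therefore enter the storage buffer in the prediction state.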

Further, the construction of the MS 3 in relation to the signal MREQ, the storage data and
the storage address transmitted from the SCU 8 to the MS 3 via the signal lines 609 and 509,
and its operation when these are received, are similar to those in the prior art, so their
explanation is omitted.

Furthermore, the prediction in the above-mentioned embodiment is not limited to the branch
decision of a branch instruction; the following predictions can also be considered, and the
present invention is applicable to them as well. One is the prediction of the branch target
instruction data of a branch instruction, as described in Japanese Patent Publication Sho
54 - 9456. Another is the prediction of the result of the operation of a preceding
instruction in the processing of two instructions between which a register conflict may
occur, as described in Japanese Patent Application Sho 58 - 237778.

[Efficacy of the Invention]

According to the present invention, in a data processor where both the prediction
processing method and the parallel operation method are adopted, operational processing of
the storage instruction is completed even in the prediction state, and the computing element
that has performed the operational processing of the storage instruction becomes usable for
the operational processing of another instruction. Hence, it is effective for increasing
the processing speed. Further, according to the present invention, writing into the main
memory is never performed by an instruction in the prediction state, so no restoration
operation is required, and no reduction in processing speed occurs on that account. In
addition, writing is never performed in error, so other CPUs or channels are prevented from
reading out the results of storage instructions whose prediction is incorrect. In this
respect, it is possible to comply with specifications such as those of the IBM System/370
architecture.

Brief Description of Drawings:

Fig. 1 is a diagram that shows typical instruction formats; 

Fig. 2 is a diagram that shows an example of an instruction array where the increase of the 
processing speed is realized; 

Fig. 3 is a diagram that shows the entire construction example of the data processor; 

Fig. 4 is a time chart when the instruction array in Fig. 2 is processed; 

Fig. 5 is a block diagram of the storage buffer; and

Fig. 6 is a diagram of the storage buffer control circuit.

3 ... main memory device, 7 ... operational unit, 8 ... memory control unit, 9 ... storage
buffer, 10 ... storage buffer control circuit, 601 ... storage buffer state control circuit,
604 ... storage buffer input control circuit, 607 ... storage buffer output control circuit.

Agent: Patent applicant: Akio TAKAHASHI [seal] 

FIG. 1
FIG. 3
FIG. 6